Compressed Sensing Recovery
via Nonconvex Shrinkage Penalties 

Joseph Woodworth, Rick Chartrand, Senior Member, IEEE


Abstract 

The $\ell^0$ minimization of compressed sensing is often relaxed to $\ell^1$, which yields easy computation
using the shrinkage mapping known as soft thresholding, and can be shown to recover the original 
solution under certain hypotheses. Recent work has derived a general class of shrinkages and associated 
nonconvex penalties that better approximate the original penalty and empirically can recover the 
original solution from fewer measurements. We specifically examine p-shrinkage and firm thresholding. 
In this work, we prove that given data and a measurement matrix from a broad class of matrices, one can 
choose parameters for these classes of shrinkages to guarantee exact recovery of the sparsest solution. 
We further prove convergence of the algorithm iterative p-shrinkage (IPS) for solving one such relaxed 
problem. 


Index Terms 

compressed sensing, nonconvexity, relaxation, exact recovery, stability, convergence 


I. Introduction 

Compressed sensing has been successfully applied in a multitude of scientific fields, ranging
from image processing tasks to radar to coding theory, making the potential impact of
advancements in theory and practice rather large. Compressed sensing methods rely on the notion
of sparsity, which is primarily approximated via the $\ell^1$ norm [1], [2]. The nature and
limitations of this relaxation have been well studied [?], as well as some alternative
relaxations, such as the $\ell^p$ quasinorm [5], [10]-[20]. The nonconvex $\ell^p$ quasinorm
approaches present a tradeoff: closer approximation of sparsity for harder analysis and
computation. Recent work has introduced generalized nonconvex penalties [21]-[27] that have thus
far demonstrated strong empirical performance [?], [28]. In this paper, we prove conditions that
guarantee good performance of these generalized penalties.


A. Compressed Sensing 

Compressed sensing seeks to represent a signal from a small number of linear measurements. 
We let the vector $x \in \mathbb{R}^n$ represent the original signal. The linear measurements are
the result of an application of the short and fat measurement matrix
$A \in \mathbb{R}^{m \times n}$, with $m \ll n$. One is given the measurements $b := Ax$ and wants
to recover $x$. Of course $m \ll n$ implies that $Ax = b$ is an underdetermined linear system in
$x$, so additional assumptions must be made about $x$. Thus one assumes that $x$ is sparse,
meaning that it has few nonzero entries. By considering the standard definition of $p$ norms for
vectors,

$$\|w\|_p^p := \sum_i |w_i|^p, \tag{1}$$

and taking the limit as $p$ approaches 0 from above, we get the $\ell^0$ penalty, $\|w\|_0$, which
counts the number of nonzero entries of $w$. One would like to find the sparsest vector
$w \in \mathbb{R}^n$ whose measurements are $b$, which suggests the following optimization problem:

$$\min_w \|w\|_0 \quad \text{subject to } Aw = b. \tag{2}$$

Unfortunately, this problem is known to be NP-hard (Non-deterministic Polynomial-time hard)
in general [29, Sec. 9.2.2]. In other words, without making further assumptions on $A$ and $x$,
an algorithm solving this problem would be computationally intractable. For this reason, one
relaxes the problem, replacing the $\ell^0$ penalty with other penalties.

B. $\ell^1$ relaxation

The $\ell^1$ relaxed version of the compressed sensing problem is as follows:

$$\min_w \|w\|_1 \quad \text{subject to } Aw = b. \tag{3}$$

In contrast to the combinatorial $\ell^0$ problem, this problem minimizes a convex energy subject to
linear constraints, and can be recast as a linear program. Extensive theory has been developed
to study the properties of solutions to convex problems [30]. Further, a subproblem related to
the $\ell^1$ relaxation of compressed sensing has a closed-form solution, given by an application of
a shrinkage operator:


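Definition I.1. Soft thresholding is given by the following formula:

$$S_{\lambda,1}(x)_i = s_{\lambda,1}(|x_i|)\,\operatorname{sign}(x_i) = \max\{|x_i| - \lambda, 0\}\,\operatorname{sign}(x_i). \tag{4}$$

The role soft thresholding plays is as the proximal mapping of the $\ell^1$ norm:

$$S_{\lambda,1}(x) = \operatorname{prox}_{\lambda\|\cdot\|_1}(x) := \arg\min_w \lambda\|w\|_1 + \tfrac{1}{2}\|w - x\|_2^2. \tag{5}$$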

Several algorithms for compressed sensing make use of this proximal mapping, such as iterative
soft thresholding [31], alternating direction method of multipliers (ADMM) [32]-[35], and the
Chambolle-Pock algorithm [36]. The explicit formula for (5) makes the use of $\ell^1$ regularization
particularly convenient.
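As a quick numerical illustration of (4) and (5), the following Python sketch compares the closed-form soft thresholding with a brute-force grid minimization of the scalar proximal objective. NumPy is assumed, and the names `soft_threshold` and `prox_l1_grid`, the test vector, and the grid resolution are illustrative choices, not part of the original paper.

```python
import numpy as np

def soft_threshold(x, lam):
    """Componentwise soft thresholding, as in (4)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_l1_grid(x_i, lam, grid):
    """Grid minimizer of lam*|w| + 0.5*(w - x_i)^2, the scalar version of (5)."""
    objective = lam * np.abs(grid) + 0.5 * (grid - x_i) ** 2
    return grid[np.argmin(objective)]

lam = 0.5
x = np.array([-2.0, -0.3, 0.0, 0.4, 1.7])
grid = np.linspace(-3.0, 3.0, 60001)  # grid spacing 1e-4

closed_form = soft_threshold(x, lam)
brute_force = np.array([prox_l1_grid(x_i, lam, grid) for x_i in x])
max_gap = np.max(np.abs(closed_form - brute_force))
print(closed_form, max_gap)
```

The two results should agree up to the grid spacing, which is what makes the explicit formula attractive inside iterative solvers.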

All of this suggests why the $\ell^1$ relaxation of compressed sensing is nice to solve, but does
not motivate it as the right problem to solve. In particular, one is interested in conditions under
which the solution to the $\ell^1$ relaxation (3) of compressed sensing equals or approximately
equals the solution of the original $\ell^0$ compressed sensing problem (2). The papers [1], [2]
developed theory for the recovery of the $\ell^0$ solution by the $\ell^1$ problem. In the years that
followed, obtaining looser conditions for exact $\ell^1$ recovery received continuing interest
[3]-[11], [?]. One type of condition for recovery of the $\ell^0$ solution from the $\ell^1$ problem
relies on the restricted isometry constants associated with the measurement matrix $A$. The
restricted isometry constant of order $k$ associated with the matrix
$A \in \mathbb{R}^{m \times n}$ is the smallest $\delta_k \ge 0$ such that the following holds for
all $x \in \mathbb{R}^n$ with $\|x\|_0 \le k$ [37]:

$$(1 - \delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\|x\|_2^2. \tag{6}$$


Note that when $\delta_k > 1$ the lower bound becomes trivial and the upper bound can be improved by
rescaling $A$. Thus any measurement matrix, with appropriate rescaling, can achieve $\delta_k = 1$, so
one typically only regards $\delta_k \in [0,1)$. One of the best current $\ell^1$ recovery results
states that for sufficiently large $n$, a sparse vector $x \in \mathbb{R}^n$ with $\|x\|_0 = k$ can
be recovered by $\ell^1$ minimization as long as $k < m/2$ and the restricted isometry constant of
order $2k$ associated with $A$ satisfies $\delta_{2k} < 1/2$ [?].
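To make the definition in (6) concrete, the sketch below computes a restricted isometry constant by brute force: for each support of size $k$ it takes the extreme eigenvalues of the corresponding Gram matrix. This enumeration is only feasible for very small $n$ and $k$ and is meant as an illustration, not a practical test; the function name `restricted_isometry_constant` and the random test matrix are assumptions made for the example.

```python
import itertools
import numpy as np

def restricted_isometry_constant(A, k):
    """Smallest delta satisfying (6) for all k-sparse x, by enumerating supports."""
    n = A.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(n), k):
        cols = A[:, list(support)]
        eigs = np.linalg.eigvalsh(cols.T @ cols)  # eigenvalues in ascending order
        # (6) on this support requires 1 - delta <= eigs[0] and eigs[-1] <= 1 + delta
        delta = max(delta, 1.0 - eigs[0], eigs[-1] - 1.0)
    return delta

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 9)) / np.sqrt(6)  # columns have unit norm in expectation
delta_1 = restricted_isometry_constant(A, 1)
delta_2 = restricted_isometry_constant(A, 2)
print(delta_1, delta_2)
```

Supports of size exactly $k$ suffice, because every smaller support is contained in one of them; this is also why the constants are nondecreasing in $k$.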


C. $\ell^p$ relaxations (0 < p < 1)

A similar relaxation of the $\ell^0$ problem that achieves recovery results in broader cases is
$\ell^p$ minimization for $0 < p < 1$. In contrast to the $\ell^1$ norm, the $\ell^p$ quasinorms for
$0 < p < 1$ are not convex. Hence much of the theory of convex analysis no longer applies, making
solution uniqueness and convergence results more complicated. However, the loss of convexity comes
with the benefit that $\ell^p$ is better able to approximate the original $\ell^0$ than $\ell^1$ can.
As a result, one can show that for any given measurement matrix with restricted isometry constant
$\delta_{2k} < 1$, there exists some $p \in (0,1)$ that will guarantee exact recovery of signals
with support smaller than $k < m/2$ by the $\ell^p$ minimization problem [13]. It has also been
demonstrated empirically that $\ell^p$ minimization gives better sparse recovery results than
$\ell^1$ minimization [38]-[40], with improved robustness [14], [18], [19].

Consider the proximal mapping of the $\ell^p$ quasinorm (to the $p$th power, for simplicity), that is,

$$\operatorname{prox}_{\lambda\|\cdot\|_p^p}(x) := \arg\min_w \lambda\|w\|_p^p + \tfrac{1}{2}\|w - x\|_2^2. \tag{7}$$

Unfortunately, (7) is a discontinuous mapping [41], and there is no closed-form expression for
(7) for general $p$. (The expression given in [42] is incorrect. For the special cases of $p = 1/2$
or $2/3$, the proximal mapping can be expressed in terms of the solution of a cubic or quartic
equation, explicitly but cumbersomely.) This prevents several efficient algorithms from being
generalized from $\ell^1$ to $\ell^p$ minimization.


D. Generalized shrinkage 

The need for an explicit proximal mapping motivates the approach of specifying a shrinkage
mapping, and minimizing an implicitly defined penalty function whose proximal mapping is
the specified shrinkage [21]-[23], [27]. In this work, we extend theoretical results for recovery
of sparse signals to the case of penalty functions induced by two families of shrinkages,
p-shrinkage and firm thresholding (see Defs. II.1 and II.3 below). In Section II we describe these
shrinkage mappings, and how they are the proximal mappings of nonconvex penalty functions.
In Section III we prove conditions for the exact recovery of sparse signals via minimizing such
nonconvex penalty functions. In Section IV we demonstrate the stability of signal recovery to
noisy measurements and approximately sparse signals, and in Section V we show the algorithmic
convergence of iterative p-shrinkage (IPS).


II. Generalized shrinkage penalties 

As described above, nonconvex penalty functions have been shown both theoretically and 
empirically to give better results for compressed sensing than the $\ell^1$ norm. In order to make
use of any of several efficient algorithms, we wish to consider penalty functions with explicit 
proximal mappings. In this section, we consider two such families of functions. 


A. p-shrinkage and firm thresholding 

First we consider a shrinkage mapping, a version of which first appeared in [21], that has
some qualitative resemblance to the $\ell^p$ proximal mapping, while being continuous and explicit:

Definition II.1. For $\lambda > 0$, the p-shrinkage mapping $S_p = S_{\lambda,p}$ for
$p \in \mathbb{R}$ is defined by $S_p(x)_i = s_p(|x_i|)\,\operatorname{sign}(x_i)$, where the
shrinkage function $s_p = s_{\lambda,p}$ is defined by

$$s_p(t) = \max\{t - \lambda^{2-p} t^{p-1}, 0\}. \tag{8}$$

See Fig. 1 for example plots. When $p = 1$, p-shrinkage and soft thresholding coincide. The
smaller the value of $p$, the less p-shrinkage shrinks large inputs. In the limit as $p \to -\infty$,
p-shrinkage tends pointwise to hard thresholding:


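Definition II.2. For $\lambda > 0$, the hard thresholding mapping $H_\lambda$ is defined by

$$H_\lambda(x)_i = \begin{cases} 0 & \text{if } |x_i| \le \lambda, \\ x_i & \text{if } |x_i| > \lambda. \end{cases} \tag{9}$$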


Hard thresholding is related to the proximal mapping of the $\ell^0$ penalty function:

$$H_{\sqrt{2\lambda}} \in \operatorname{prox}_{\lambda\|\cdot\|_0}, \tag{10}$$

the right side of (10) being two-valued in components satisfying $x_i^2 = 2\lambda$. Hard thresholding
imposes no bias on large inputs, but its discontinuity makes it very unstable when used with
ADMM [43].
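The relationships among p-shrinkage, soft thresholding, and hard thresholding can be checked numerically. The following is a minimal Python sketch, assuming $p \le 1$ (so that the maximum in (8) is zero exactly when the input magnitude is at most $\lambda$); the function names and the test vector are illustrative choices.

```python
import numpy as np

def p_shrink(x, lam, p):
    """p-shrinkage as in (8), assuming lam > 0 and p <= 1."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    out = np.zeros_like(mag)
    big = mag > lam  # for p <= 1 the max in (8) is zero whenever |x_i| <= lam
    # lam^(2-p) * t^(p-1) is rewritten as lam * (lam/t)^(1-p) to avoid overflow
    out[big] = mag[big] - lam * (lam / mag[big]) ** (1.0 - p)
    return np.sign(x) * out

def hard_threshold(x, lam):
    """Hard thresholding as in (9): zero out entries with |x_i| <= lam."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > lam, x, 0.0)

x = np.array([-3.0, -1.5, -0.5, 0.0, 0.8, 1.5, 4.0])
lam = 1.0
soft = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

print(p_shrink(x, lam, 1.0))      # coincides with soft thresholding
print(p_shrink(x, lam, -0.5))     # large entries are shrunk less
print(p_shrink(x, lam, -1000.0))  # numerically matches hard thresholding on this input
```

Writing the correction term as `lam * (lam / t) ** (1 - p)` keeps the computation stable for very negative `p`, which is the regime approaching hard thresholding.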
Fig. 1. Plot of several shrinkage functions, all with $\lambda = 1$. The smaller the value of $p$, the smaller the bias applied to large
inputs. Firm thresholding removes the bias completely for large enough inputs, without the discontinuity of hard thresholding.


Another shrinkage mapping we consider is firm thresholding, a continuous, piecewise-linear
approximation of hard thresholding. Firm thresholding was first introduced in [44] in connection
with the WaveShrink procedure for denoising and non-parametric regression. It was not known
at the time to be the proximal operator of a given penalty function.

Definition II.3. For $\lambda > 0$ and $\mu > \lambda$, the firm thresholding mapping
$S_{\mathrm{firm}} = S_{\lambda,\mu,\mathrm{firm}}$ is defined by
$S_{\mathrm{firm}}(x)_i = s_{\mathrm{firm}}(|x_i|)\,\operatorname{sign}(x_i)$, where
$s_{\mathrm{firm}}$ is defined by

$$s_{\mathrm{firm}}(t) = \begin{cases} 0 & \text{if } t \le \lambda, \\ \frac{\mu}{\mu - \lambda}(t - \lambda) & \text{if } \lambda \le t \le \mu, \\ t & \text{if } t \ge \mu. \end{cases} \tag{11}$$

Note that $S_{\lambda,\lambda,\mathrm{firm}} = H_\lambda$, and
$\lim_{\mu\to\infty} S_{\lambda,\mu,\mathrm{firm}}(x) = S_{\lambda,1}(x)$ pointwise. Thus both p-shrinkage
and firm thresholding can be seen as generalizing both soft and hard thresholding.
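A short Python sketch of (11) follows, together with a numerical look at the two limiting cases just mentioned. The function name `firm_threshold` and the particular values of $\mu$ used to approximate the limits are assumptions made for the illustration.

```python
import numpy as np

def firm_threshold(x, lam, mu):
    """Firm thresholding as in (11); requires mu > lam > 0."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    middle = mu / (mu - lam) * (mag - lam)  # linear ramp on [lam, mu]
    s = np.where(mag <= lam, 0.0, np.where(mag < mu, middle, mag))
    return np.sign(x) * s

x = np.array([-3.0, -1.5, -0.5, 0.0, 0.8, 1.5, 4.0])
lam = 1.0
soft = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
hard = np.where(np.abs(x) > lam, x, 0.0)

near_hard = firm_threshold(x, lam, lam + 1e-9)  # mu just above lam
near_soft = firm_threshold(x, lam, 1e9)         # mu very large
print(firm_threshold(x, lam, 2.5), near_hard, near_soft)
```

None of the test inputs falls in the tiny ramp interval when `mu` is just above `lam`, so the output there matches hard thresholding exactly.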


B. Shrinkage-induced penalty functions 

Our motivation for considering alternative shrinkage mappings is to have them as closed-form
proximal mappings. This requires that the shrinkages actually be the proximal mappings
of penalty functions. The following theorem guarantees this. It is proved in [23, Thm. 1], and
strengthens the earlier result of Antoniadis [27, Prop. 3.2].


Theorem II.4. Suppose $s : [0, \infty) \to \mathbb{R}$ is continuous, satisfies
$x \le \lambda \Rightarrow s(x) = 0$ for some $\lambda \ge 0$, is strictly increasing on
$[\lambda, \infty)$, and $s(x) \le x$. Define $S(x)_i = s(|x_i|)\,\operatorname{sign}(x_i)$, for each
$i$. Then $S$ is the proximal mapping of a penalty function $G(w) = \sum_i g(w_i)$ where $g$ is even,
strictly increasing and continuous on $[0, \infty)$, differentiable on $(0, \infty)$, and
nondifferentiable at 0 iff $\lambda > 0$ (in which case $\partial g(0) = [-1, 1]$). If also
$x - s(x)$ is nonincreasing on $[\lambda, \infty)$, then $g$ is concave on $[0, \infty)$ and $G$
satisfies the triangle inequality.


Both p-shrinkage and firm thresholding satisfy all hypotheses of the theorem for all parameter
values. The proof of the theorem constructs $g$ using the Legendre-Fenchel transform [45] of an
antiderivative of $s$. Because of the nature of the Legendre-Fenchel transform, this often does
not produce a closed-form expression for $g$. We consider this as an acceptable price to pay for
having an explicit proximal mapping, which is much more useful for most of today's
state-of-the-art algorithms for compressed sensing than having an explicit penalty function. In
the case of the penalty function $G_p$ induced by p-shrinkage, we can compute $g_p(w)$ numerically,
and example plots are in Fig. 2. In addition to the properties guaranteed by Thm. II.4, it can be
shown that $\lim_{w\to\infty} g_p(w) - w^p/p - C_p = 0$ for $p \ne 0$ and constant $C_p$ depending
only on $p$. This includes $p < 0$, in which case it follows that $g_p(w)$ is bounded above. For
$p = 0$, we have $\lim_{w\to\infty} g_0(w) - \log w - C = 0$ instead.

In the case of the penalty function $G_{\mathrm{firm}}$ induced by firm thresholding,
$g_{\mathrm{firm}}$ does have a closed form:

$$g_{\mathrm{firm}}(w) = \begin{cases} |w| - w^2/(2\mu) & \text{if } |w| \le \mu, \\ \mu/2 & \text{if } |w| \ge \mu. \end{cases} \tag{12}$$

Note that $g_{\mathrm{firm}}(w)$ is independent of $\lambda$, except that $\mu > \lambda$ is required
by the definition of $g_{\mathrm{firm}}$.
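The claim that firm thresholding is the proximal mapping associated with (12) can be sanity-checked numerically. The sketch below minimizes $\lambda g_{\mathrm{firm}}(w) + \tfrac12 (w - x)^2$ over a fine grid and compares the minimizer with (11). The scaling by $\lambda$ follows the soft-thresholding case (5); the parameter values, grid, and function names are assumptions chosen for the example.

```python
import numpy as np

def firm_threshold(x, lam, mu):
    """Firm thresholding as in (11); requires mu > lam > 0."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    middle = mu / (mu - lam) * (mag - lam)
    s = np.where(mag <= lam, 0.0, np.where(mag < mu, middle, mag))
    return np.sign(x) * s

def g_firm(w, mu):
    """Penalty component function from (12)."""
    w = np.abs(w)
    return np.where(w <= mu, w - w ** 2 / (2.0 * mu), mu / 2.0)

def prox_grid(x_i, lam, mu, grid):
    """Grid minimizer of lam*g_firm(w) + 0.5*(w - x_i)^2."""
    objective = lam * g_firm(grid, mu) + 0.5 * (grid - x_i) ** 2
    return grid[np.argmin(objective)]

lam, mu = 1.0, 2.5
x = np.array([0.5, 1.5, 2.0, 3.0, -1.8])
grid = np.linspace(-4.0, 4.0, 80001)  # grid spacing 1e-4

closed_form = firm_threshold(x, lam, mu)
brute_force = np.array([prox_grid(x_i, lam, mu, grid) for x_i in x])
max_gap = np.max(np.abs(closed_form - brute_force))
print(closed_form, max_gap)
```

On $(0, \mu)$ the scalar objective has second derivative $1 - \lambda/\mu > 0$, so the grid search has a unique minimizer to find despite the nonconvexity of $g_{\mathrm{firm}}$.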


Although the statement of Thm. II.4 excludes hard thresholding (being discontinuous), the
construction in the proof does produce a penalty function $G_{\mathrm{hard}}$. It coincides with
$G_{\mathrm{firm}}$ for $\mu = \lambda$. The part of the conclusion of the theorem that doesn't hold
is that $\operatorname{prox}_{\lambda G_{\mathrm{hard}}}(\lambda)$ is the entire interval
$[0, \lambda]$, while $H_\lambda(\lambda)$ is generally defined to take on a single value from this
interval (namely 0 in our definition (9)).

Fig. 2. Plot of penalty component function $g$ induced by several shrinkage mappings, all with $\lambda = 1$. The smaller the value of
$p$, the slower the growth of the p-shrinkage penalty function, being bounded above when $p < 0$. Both firm and hard thresholding
have penalty functions that are quadratic near the origin, then constant.


C. Example 

To motivate the consideration of p-shrinkage and firm thresholding, we consider a generalization
of an example appearing in the first compressed sensing paper [1]. We seek to reconstruct
the $256 \times 256$ Shepp-Logan phantom image from samples of its 2-D discrete Fourier transform
(DFT), taken along radial lines, thereby simulating both MRI and X-ray CT data (the latter by
way of the Fourier slice theorem). See Fig. 3. Since the phantom has a sparse gradient, we seek
to solve the following optimization problem:

$$\min_x G(\nabla x) \quad \text{subject to } Fx = b, \tag{13}$$

where $G$ is one of the penalty functions being compared, $\nabla$ is a discrete gradient using forward
differences and periodic boundary conditions, $F$ is the 2-D DFT, and $b$ contains the sample data.
We solve (13) with ADMM, where the shrinkage mapping is p-shrinkage with $p < 1$ or firm
thresholding. See [25] for details; the method is also a straightforward generalization of the
algorithm of [?].

With $G = G_1 = \|\cdot\|_1$, 18 lines are required for exact reconstruction, while using
$G = G_{-1/2}$, 9 lines suffice, as shown in [21], the latter being the fewest that had been
demonstrated at that time. In [22] (see also [23]), 6 lines were shown to suffice using the $G$
induced by a shrinkage mapping that is a $C^\infty$ approximation of hard thresholding. This is the
fewest possible, since with 5 lines, there are fewer measurements than nonzero gradient pixels,
so that the phantom will not even be a local minimizer of the problem with $G = \|\cdot\|_0$.
However, here we report that using $G = G_{\mathrm{firm}}$ (with $\lambda = 0.1$ and $\mu = 2.5$),
6 lines also suffice, and many fewer ADMM iterations are needed (337 versus 2213).

Fig. 3. The Shepp-Logan phantom, and the number of radial lines of Fourier samples needed to reconstruct the phantom
perfectly using different penalty functions. Panels: (a) Shepp-Logan phantom; (b) $p = 1$, 18 lines; (c) $p = -1/2$, 9 lines; (d) firm, 6 lines.

While this example is an ideal case, using a very sparse image and noisefree measurements, 
this does demonstrate that p-shrinkage and firm thresholding induce penalty functions that can be 
useful for recovering sparse signals. Now we turn to a theoretical analysis of the sparse recovery 
performance of minimizing these penalty functions. 

III. Exact recovery 

In this section, we establish sufficient conditions for exact recovery of sparse signals from
noisefree measurements by solving a minimization problem with penalty function $G$:

$$\min_w G(w) \quad \text{subject to } Aw = b. \tag{14}$$

Our objective is to determine sufficient conditions in the case where $G$ is a penalty function
induced by a shrinkage mapping; however, we will establish conditions for a somewhat more
general class of penalty functions $G$. We shall assume that the measurement matrix
$A \in \mathbb{R}^{m \times n}$ has the Unique Representation Property (URP), i.e., any $m$ columns
of $A$ are linearly independent. This implies that any nonzero vector in $\ker(A)$ has at least
$m + 1$ nonzero entries. The URP can be regarded as a generic property of matrices; for example, a
matrix whose entries are independently and identically distributed samples drawn from any
absolutely continuous probability distribution will have URP with probability 1.
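For small matrices the URP can be verified directly by checking every set of $m$ columns. The sketch below does this with a singular-value test; the function name `has_urp`, the tolerance, and the test matrices are assumptions for illustration, and the enumeration is exponential in general.

```python
import itertools
import numpy as np

def has_urp(A, tol=1e-10):
    """Return True if every set of m columns of the m-by-n matrix A is linearly independent."""
    m, n = A.shape
    for cols in itertools.combinations(range(n), m):
        sub = A[:, list(cols)]
        smallest = np.linalg.svd(sub, compute_uv=False)[-1]  # smallest singular value
        if smallest <= tol:
            return False
    return True

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))  # i.i.d. Gaussian entries
B = A.copy()
B[:, 1] = B[:, 0]                # a repeated column breaks the URP

print(has_urp(A), has_urp(B))
```

A Gaussian matrix passes the check, consistent with the probability-1 statement above, while duplicating a column produces a singular submatrix.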

Remark III.1. The URP implies that the $m$ rows of $A$ are linearly independent. Thus an
orthonormal basis for the span of the rows can be formulated as linear combinations of the rows
of $A$. So if we multiply $A$ by a product of elementary matrices, $E$, corresponding to the
necessary elementary row operations, the resulting product will have orthonormal rows. Since
elementary matrices are invertible, $Aw = b$ is equivalent to $EAw = Eb$. Also, since each
elementary matrix is invertible, $A_T$ (the columns of $A$ indexed by $T$) being full rank for
$|T| = m$ implies $EA_T$ is full rank as well, and so $A$ satisfying the URP implies $EA$
satisfies the URP. Thus we can always transform the problem so that the rows of $A$ are
orthonormal, i.e., $AA^T = I$, and so without loss of generality, we assume that
the $A$ given satisfies $AA^T = I$.
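One way to realize the transformation in this remark numerically is through a QR factorization of $A^T$ rather than explicit elementary row operations; this is a substitution made here for convenience, not the construction described in the remark. If $A^T = QR$, then $E = R^{-T}$ is invertible and $EA = Q^T$ has orthonormal rows. The function name and test data below are illustrative assumptions.

```python
import numpy as np

def orthonormalize_rows(A, b):
    """Return (E A, E b, E) with E invertible and (E A)(E A)^T = I; A must have full row rank."""
    Q, R = np.linalg.qr(A.T)  # reduced QR: A.T = Q R, Q is n-by-m, R is m-by-m
    E = np.linalg.inv(R.T)    # A = R.T Q.T, hence E A = Q.T
    return E @ A, E @ b, E

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 6))
x0 = rng.standard_normal(6)
b = A @ x0

A2, b2, E = orthonormalize_rows(A, b)
print(np.round(A2 @ A2.T, 12))  # identity matrix up to rounding
```

Because $E$ is invertible, the feasible set $\{w : Aw = b\}$ is unchanged, so any vector solving the original system also solves the transformed one.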

We shall also assume that $G(w) = \sum_i g(w_i)$ with
I) $g(0) = 0$, and $g$ even on $\mathbb{R}$; and

II) $g$ is continuous on $\mathbb{R}$, and either strictly increasing and strictly concave on
$[0, \infty)$, or strictly increasing and strictly concave on $(0, \gamma]$ and constant on
$[\gamma, \infty)$ for some $\gamma > 0$.

These conditions imply that $g$ is nondecreasing and concave on $[0, \infty)$, is everywhere
nonnegative, and satisfies the triangle inequality.

Lemma III.2. The penalty functions $G_{\mathrm{firm}}$ and $G_p$ (for $-\infty < p < 1$) satisfy the
above conditions.

Proof: It is clear from the expression (12) for $g_{\mathrm{firm}}$ that $G_{\mathrm{firm}}$
satisfies the conditions with $\gamma = \mu$.

For $G_p$, by Thm. II.4 we get condition I, and that $g_p$ is continuous, strictly increasing, and
differentiable on $(0, \infty)$. It suffices to prove that $g_p$ is twice differentiable on
$(0, \infty)$ with $g_p'' < 0$; it will be no more difficult to show that
$g_p \in C^\infty(0, \infty)$. We need some details from the construction of $g_p$, from [23]. We have

$$g_p(w) = (f_p^*(w) - w^2/2)/\lambda, \tag{15}$$

where $f_p' = s_p$ and $f_p^*$ is the Legendre-Fenchel transform of $f_p$. Since $s_p$ is continuous
and nondecreasing, $f_p$ is $C^1$ and convex. Then by [45, Prop. 11.3], we have that

$$x \in \partial f_p^*(w) \iff w = f_p'(x) = s_p(x). \tag{16}$$

Fix $w > 0$, and let $x$ be such that $w = s_p(x)$. From (8), we must have $x > \lambda$, so
$w = x - \lambda^{2-p} x^{p-1}$. If we define $F(x, w) = x - \lambda^{2-p} x^{p-1} - w$, we have that
$F(\cdot, w)$ is $C^\infty$ on $(0, \infty)$, and $\frac{\partial F}{\partial x}(x, w) \ne 0$ for
$x \in (\lambda, \infty)$. Thus by the implicit function theorem, $f_p^*$ is $C^\infty$ on
$(0, \infty)$, hence $g_p$ is as well by (15).

Returning to $w = x - \lambda^{2-p} x^{p-1}$, by (15), (16), and the differentiability of $f_p^*$,
we have

$$g_p'(w) = ((f_p^*)'(w) - w)/\lambda = (\lambda/x)^{1-p}. \tag{17}$$

Thus $g_p'(w)$ is decreasing in $x$ on $(\lambda, \infty)$, and since $x$ is a strictly increasing
function of $w$ on $(0, \infty)$, $g_p''(w) < 0$ on $(0, \infty)$. ■

Lemma III.3. Assume $A \in \mathbb{R}^{m \times n}$ satisfies the URP and $G$ satisfies (I,II)
above. Then the global minimizer of (14) has $m$ or fewer nonzero entries.

Proof: Consider $w$ such that $Aw = b$ and $\|w\|_0 > m$. Define the matrix $M$ to have the
columns $-w_i e_i$. The set of vectors $Mv$ with $\operatorname{supp}(v) \subset \operatorname{supp}(w)$
span a subspace of dimension greater than $m$. Since $\dim(\ker(A)) = n - m$, we can choose a $v$
with $Mv \in \ker(A)$ and $\|v\|_\infty = 1$.

For all $t \in \mathbb{R}$, $w + tMv$ is feasible. Define $T = \{i : v_i \ne 0 \text{ and } |w_i| < \gamma\}$
(taking $\gamma = +\infty$ if the first case of assumption II holds). First suppose
$T \ne \emptyset$. Then by assumption II, the function $t \mapsto G(w + tMv)$ is strictly concave on
an interval $[-\delta, \delta]$, with $\delta > 0$ chosen small enough that every $(w + tMv)_i$ has
the same sign as $w_i$ for all $|t| \le \delta$. Then
$G(w) > \min\{G(w - \delta Mv), G(w + \delta Mv)\}$, and $w$ is not a global minimizer.

Otherwise, we have $v_i \ne 0 \Rightarrow |w_i| \ge \gamma$. Let
$t_0 = \sup\{t : \forall i,\ \min\{|(w - tMv)_i|, |(w + tMv)_i|\} \ge \gamma\}$. Then taking
$t_1 = t_0 + \delta$ with $\delta > 0$ again small enough that every $(w \pm t_1 Mv)_i$ has the same
sign as $w_i$, one of $|(w \pm t_1 Mv)_i|$ is less than $\gamma$ for at least one $i$, giving a
smaller value of $g$. Since all other components keep $g$ constant, we have one of
$G(w \pm t_1 Mv)$ being smaller than $G(w)$. ■

Lemma III.4. Assume $A \in \mathbb{R}^{m \times n}$ satisfies the URP. Then the magnitudes of
nonzero entries of vectors $w$ satisfying $Aw = b$ with $m$ or fewer nonzero entries are uniformly
bounded below by some positive constant $\alpha$ and uniformly above by some positive constant
$\beta$.

Proof: By the URP, every $m$ columns of $A$ can admit no more than one solution. Thus
there are no more than $\binom{n}{m}$ vectors $w$ satisfying $Aw = b$ with $m$ or fewer nonzero
entries. Thus the set of nonzero entries of these vectors is finite and bounded below and above by
$\alpha$, $\beta$ respectively. Neither constant depends on $G$ in any way. ■

Note that Lemma III.3 and Lemma III.4 imply that the global minimizer of the
equality-constrained $G$ minimization problem has nonzero entries with magnitude bounded below by
$\alpha$ and above by $\beta$.
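The counting argument in the proof of Lemma III.4 also suggests a direct, if exponential-time, way to compute $\alpha$ and $\beta$ for a small problem: solve the square system for each choice of $m$ columns and record the extreme nonzero magnitudes. The sketch below does this; the function name, the tolerance used to decide which entries count as nonzero, and the test problem are assumptions.

```python
import itertools
import numpy as np

def sparse_solution_bounds(A, b, tol=1e-9):
    """Smallest and largest nonzero magnitudes over all solutions supported on m columns."""
    m, n = A.shape
    alpha, beta = np.inf, 0.0
    for cols in itertools.combinations(range(n), m):
        w_S = np.linalg.solve(A[:, list(cols)], b)  # unique solution on this support (URP)
        mags = np.abs(w_S)
        nonzero = mags[mags > tol]
        if nonzero.size:
            alpha = min(alpha, nonzero.min())
            beta = max(beta, nonzero.max())
    return alpha, beta

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
x = np.zeros(5)
x[1] = 2.0                      # a 1-sparse signal
b = A @ x

alpha, beta = sparse_solution_bounds(A, b)
print(alpha, beta)
```

Every solution with at most $m$ nonzero entries is supported inside some set of $m$ columns, so enumerating these sets covers all candidates, including the sparse signal itself.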

Next we introduce the $G$ Nullspace Property, a generalization of the $\ell^1$ Nullspace Property
introduced in [46] for norms and implicitly in [11] for penalty functions belonging to a particular
class. We denote $\{1, 2, \ldots, n\} = [n]$, and $T^c$ denotes the complement of $T$ in $[n]$.

Definition III.5. The $G$ Nullspace Property (or $G$ NSP) of order $k$ for the matrix $A$ is
satisfied when for all $h \in \ker(A) \setminus \{0\}$ and $T \subset [n]$ with $|T| \le k$, one has
$G(h_T) < G(h_{T^c})$.


Proposition III.6. For a penalty function G satisfying the triangle inequality, the G NSP implies 
exact recovery. 

Proof: We simply observe that the proof of [11] works assuming only that the penalty
function satisfies the triangle inequality. ■


Definition III.7. Let the matrix $A \in \mathbb{R}^{m \times n}$ and the vector
$b \in \mathbb{R}^m$ be given. Let $x$ be the sparsest solution to $Aw = b$, $k = \|x\|_0$ with
$2k < m$, and $T = \operatorname{supp}(x)$. We say the $G$ Restricted Nullspace Property (or
$G$ RNSP) of order $k$ is satisfied if whenever $w$ satisfies $Aw = b$ and $\|w\|_0 \le m$, then for
$h = x - w$, we have either $h = 0$ or $G(h_T) < G(h_{T^c})$.

Note that the $G$ NSP of order $k$ for $A$ implies the $G$ RNSP of order $k$ for $A$. However,
examining the proof of Proposition III.6 from [11] and applying Lemma III.3 shows that in fact
$G$ RNSP suffices for exact recovery. We assume $2k < m$ to guarantee that the sparsest solution
of $Aw = b$ is unique, as URP ensures that a second solution must have more than $m - k$ nonzero
components.


Proposition III.8. For penalty function G satisfying the triangle inequality, G RNSP implies 
exact recovery. 

Theorem III.9 ($G$ exact recovery). Assume $A \in \mathbb{R}^{m \times n}$ satisfies the URP and
$G$ satisfies (I,II) above. For given $b$, let $x^*$ be the global minimizer of (14) and $x$ the
sparsest feasible vector. Let $k = \|x\|_0$, and define $\alpha$, $\beta$ to be the lower and upper
bound of magnitudes of nonzero entries of feasible vectors with $m$ or fewer nonzero components as
in Lemma III.4. If $2k < m$ and $k\,g(2\beta) < (m + 1 - k)\,g(\alpha)$ then $x^* = x$.

Proof: Let $h = x^* - x$. Since $x$ is supported on $T$, $h_{T^c} = x^*_{T^c}$, and so for all
$t \in T^c$, $|h(t)|$ is either zero or at least $\alpha$. Also, since $h \in \ker(A)$, if
$h \ne 0$, then $\|h_{T^c}\|_0 \ge m + 1 - k$ (otherwise we would have $\|h\|_0 \le m$, violating
URP), so that $G(h_{T^c}) \ge (m + 1 - k)\,g(\alpha)$. Also,

$$G(h_T) \le \sum_{i \in T} g(|x_i^*| + |x_i|) \le k\,g(2\beta) < (m + 1 - k)\,g(\alpha) \tag{18}$$

by assumption. Thus either $h = 0$ or $G(h_T) < G(h_{T^c})$, so $G$ RNSP is satisfied. ■


Corollary III.10 ($G_{\mathrm{firm}}$ exact recovery). Assume $A \in \mathbb{R}^{m \times n}$
satisfies URP and $G = G_{\mathrm{firm}}$, the penalty corresponding to firm thresholding. For given
$b$, let $x^*$ be the global minimizer of (14) and $x$ the sparsest feasible vector. Let
$k = \|x\|_0$. If $2k < m$ and

$$\mu < \alpha\,\frac{m + 1 - k}{k}\left(1 + \sqrt{1 - \frac{k}{m + 1 - k}}\right) \tag{19}$$

then $x^* = x$.


Proof: Since $A$ satisfies URP and $G$ satisfies (I,II), we may apply Theorem III.9. The
inequality conditions from Theorem III.9 are $2k < m$ and $k\,g(2\beta) < (m + 1 - k)\,g(\alpha)$. We
know $\alpha \le 2\beta$. If we have $\mu \le \alpha$, then the inequality becomes
$k\mu/2 < (m + 1 - k)\mu/2$, which follows automatically from $2k < m$. And so we satisfy the
hypotheses of Theorem III.9 and thus have exact recovery. If instead we have
$\alpha < \mu < 2\beta$, we can evaluate the desired inequality as follows:

$$k\mu/2 < (m + 1 - k)(\alpha - \alpha^2/2\mu), \tag{20}$$

$$\mu^2 - 2\,\frac{m + 1 - k}{k}\,\alpha\mu + \frac{m + 1 - k}{k}\,\alpha^2 < 0, \tag{21}$$

$$-\alpha\,\frac{m + 1 - k}{k}\sqrt{1 - \frac{k}{m + 1 - k}} < \mu - \alpha\,\frac{m + 1 - k}{k} < \alpha\,\frac{m + 1 - k}{k}\sqrt{1 - \frac{k}{m + 1 - k}}, \tag{22}$$

$$\alpha\,\frac{m + 1 - k}{k}\left(1 - \sqrt{1 - \frac{k}{m + 1 - k}}\right) < \mu < \alpha\,\frac{m + 1 - k}{k}\left(1 + \sqrt{1 - \frac{k}{m + 1 - k}}\right). \tag{23}$$

The left bound is always looser than the assumed $\alpha < \mu$ (for $2k < m + 1$), so the condition
$\mu < \alpha\,\frac{m + 1 - k}{k}\left(1 + \sqrt{1 - \frac{k}{m + 1 - k}}\right)$ gives the desired
inequality and guarantees exact recovery. ■
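The bound on $\mu$ can be evaluated and checked against the inequality $k\,g(2\beta) < (m + 1 - k)\,g(\alpha)$ of Theorem III.9 directly. The following sketch uses the closed form (12) for $g_{\mathrm{firm}}$; the numerical values of $m$, $k$, $\alpha$, and $\beta$ are hypothetical and chosen only so that $\alpha < \mu < 2\beta$ near the bound.

```python
import numpy as np

def g_firm(w, mu):
    """Penalty component function from (12)."""
    w = abs(w)
    return w - w ** 2 / (2.0 * mu) if w <= mu else mu / 2.0

def firm_mu_bound(alpha, m, k):
    """Upper bound on mu from the corollary: alpha * r * (1 + sqrt(1 - 1/r)), r = (m+1-k)/k."""
    r = (m + 1 - k) / k
    return alpha * r * (1.0 + np.sqrt(1.0 - 1.0 / r))

def recovery_condition(alpha, beta, m, k, mu):
    """Sufficient condition of Theorem III.9 specialized to g_firm."""
    return k * g_firm(2.0 * beta, mu) < (m + 1 - k) * g_firm(alpha, mu)

m, k = 10, 3               # hypothetical problem sizes with 2k < m
alpha, beta = 0.5, 2.0     # hypothetical entry bounds, as in Lemma III.4
bound = firm_mu_bound(alpha, m, k)

holds_below = recovery_condition(alpha, beta, m, k, 0.99 * bound)
holds_above = recovery_condition(alpha, beta, m, k, 1.01 * bound)
print(bound, holds_below, holds_above)
```

With these values the condition holds slightly below the bound and fails slightly above it, consistent with the quadratic inequality (21) changing sign at the larger root.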

Corollary III.11 ($G_p$ exact recovery). Assume $A \in \mathbb{R}^{m \times n}$ satisfies the URP
and $G = G_p$, the p-shrinkage penalty. For given $b$, let $x^*$ be the global minimizer of (14) and
$x$ the sparsest feasible vector. Let $k = \|x\|_0$. If $2k < m$ then there exist $\lambda > 0$ and
$0 < p < 1$ sufficiently small that $x^* = x$. For any $p < 0$ there also exists $\lambda > 0$
sufficiently small that $x^* = x$.


Proof: Since $A$ satisfies the URP and $G_p$ satisfies (I,II), we may apply Theorem III.9. The
inequality conditions from Theorem III.9 are $2k < m$ and $k\,g(2\beta) < (m + 1 - k)\,g(\alpha)$.

Fix $w > 0$. As in the proof of Lemma III.2 we have

$$g_p(w) = (f_p^*(w) - w^2/2)/\lambda, \tag{24}$$

where $f_p' = s_p$ and $f_p^*$ is the Legendre-Fenchel transform of $f_p$ and is smooth at $w$. Let
$x = (f_p^*)'(w)$, noting that while $w$ is fixed, $x$ depends on $\lambda$ and $p$. By (16), we have
$s_p(x) = w$, so that

$$x - w = \lambda^{2-p} x^{p-1}. \tag{25}$$

Furthermore, by [45, Prop. 11.3], we have

$$x = \arg\max_x\,(xw - f_p(x)), \tag{26}$$

so that by definition of the Legendre-Fenchel transform,

$$f_p^*(w) = xw - f_p(x). \tag{27}$$

Combining (24), (25), and (27), we obtain

$$g_p(w) = (xw - f_p(x) - w^2/2)/\lambda$$

$$= (xw - x^2/2 + \lambda^{2-p} x^p/p - \lambda^2(1/p - 1/2) - w^2/2)/\lambda \tag{28}$$

$$= \lambda^{1-p} x^p/p - (x - w)^2/(2\lambda) - \lambda(1/p - 1/2)$$

$$= \frac{\lambda}{p}(x/\lambda)^p - \frac{\lambda}{2}(x/\lambda)^{2p-2} - \lambda(1/p - 1/2). \tag{29}$$

(In (28), the expression for $f_p(x)$ is obtained by antidifferentiating $s_p$ with $f_p(0) = 0$.)
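Formula (29) gives a practical way to evaluate $g_p$ numerically: solve (25) for $x$ by a one-dimensional root search and substitute. The sketch below does this with bisection and checks the result against the derivative formula (17) by finite differences. It assumes $p \le 1$, $p \ne 0$, and $\lambda > 0$; the function names, iteration count, and test values are illustrative assumptions.

```python
def solve_x(w, lam, p, iters=200):
    """Solve w = x - lam^(2-p) x^(p-1), i.e. (25), for x > lam by bisection (w > 0, p <= 1)."""
    lo, hi = lam, w + lam  # the left side minus w is negative at lo and nonnegative at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # lam^(2-p) x^(p-1) written as lam * (lam/x)^(1-p) for numerical stability
        if mid - lam * (lam / mid) ** (1.0 - p) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g_p(w, lam, p):
    """Penalty component g_p evaluated through (29); requires p != 0 and p <= 1."""
    w = abs(w)
    if w == 0.0:
        return 0.0
    r = solve_x(w, lam, p) / lam
    return lam * (r ** p / p - 0.5 * r ** (2.0 * p - 2.0) - (1.0 / p - 0.5))

lam, p, w, h = 0.7, 0.5, 1.3, 1e-5
finite_diff = (g_p(w + h, lam, p) - g_p(w - h, lam, p)) / (2.0 * h)
formula_17 = (lam / solve_x(w, lam, p)) ** (1.0 - p)
print(finite_diff, formula_17)
```

As a consistency check, setting `p = 1` makes (29) collapse to $g_1(w) = |w|$, the $\ell^1$ penalty, and for $p < 0$ the computed values stay below $\lambda(1/2 - 1/p)$.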
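a) Case $0 < p < 1$: We want to show that for sufficiently small $0 < \lambda$ and $0 < p < 1$,
$g(2\beta)/g(\alpha) < (m + 1 - k)/k$. By hypothesis, $(m + 1 - k)/k > 1$. So it suffices to show for
any fixed $\alpha$, $\beta$ with $0 < \alpha \le 2\beta$, that $g(2\beta)/g(\alpha) \to 1$ as
$(p, \lambda) \to (0^+, 0^+)$.

By (25), $x > w$ for any $\lambda$ and $p$, so $\lim_{\lambda \to 0^+}(x/\lambda) = \infty$. Then
for $p < 1$,

$$\lim_{\lambda \to 0^+} \left( g_p(w) - \frac{\lambda}{p}\left[(x/\lambda)^p - 1\right] \right) = 0. \tag{30}$$

Now

$$\frac{\lambda}{p}\left[(x/\lambda)^p - 1\right] = \frac{\lambda}{p}\left[\exp(p\log(x/\lambda)) - 1\right] = \frac{\lambda}{p}\left[p\log(x/\lambda) + o(p\log(x/\lambda))\right], \tag{31}$$

where the little-o is as $p\log(x/\lambda) \to 0^+$, which we wish to establish as
$p, \lambda \to 0^+$. Since $x > w$, we have that

$$p\log(x/\lambda) = p\log\left(w/\lambda + (x/\lambda)^{p-1}\right) < p\log\left(w/\lambda + (w/\lambda)^{p-1}\right) \to 0^+$$

provided $p \to 0^+$ fast enough, such as if $p \sim \lambda^q$ for any $q > 0$. This yields

$$\liminf_{(\lambda,p) \to (0^+,0^+)} \frac{g_p(2\beta)}{g_p(\alpha)} = \liminf_{(\lambda,p) \to (0^+,0^+)} \frac{\lambda\log(x(2\beta)/\lambda)}{\lambda\log(x(\alpha)/\lambda)} \le \liminf_{(\lambda,p) \to (0^+,0^+)} \frac{\log\left(2\beta/\lambda + (2\beta/\lambda)^{p-1}\right)}{\log(\alpha/\lambda)} \tag{32}$$

$$= \liminf_{\lambda \to 0^+} \frac{\log(2\beta) - \log(\lambda)}{\log(\alpha) - \log(\lambda)} = 1. \tag{33}$$

Therefore, there exist $\lambda > 0$, $p > 0$ sufficiently small that
$k\,g(2\beta) < (m + 1 - k)\,g(\alpha)$.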

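b) Case $p < 0$: Since $g_p$ is strictly increasing on $[0, \infty)$, we take $w \to \infty$ to
determine an upper bound. Note that $x(w) > w$ implies that $x(w) \to \infty$ as $w \to \infty$. Then
from (29), since now $p < 0$, we obtain

$$\lim_{w \to \infty} g_p(w) = \lambda(1/2 - 1/p). \tag{34}$$

Thus for $p < 0$ and all $w$, $\lambda$, we have $g_p(w) < \lambda(1/2 - 1/p)$. Applying this with
$w = 2\beta$ and using (29),

$$\liminf_{\lambda \to 0^+} \frac{g_p(2\beta)}{g_p(\alpha)} \le \liminf_{\lambda \to 0^+} \frac{\lambda(1/2 - 1/p)}{\lambda\left[\frac{1}{p}(x(\alpha)/\lambda)^p - \frac{1}{2}(x(\alpha)/\lambda)^{2p-2} - (1/p - 1/2)\right]}. \tag{35}$$

As before, $(x/\lambda) \to \infty$ as $\lambda \to 0^+$. Then

$$\liminf_{\lambda \to 0^+} \frac{g_p(2\beta)}{g_p(\alpha)} \le \lim_{\lambda \to 0^+} \frac{\lambda(1/2 - 1/p)}{\lambda(1/2 - 1/p)} = 1. \tag{36}$$

Thus for every $p < 0$ there exists $\lambda > 0$ sufficiently small that
$k\,g(2\beta) < (m + 1 - k)\,g(\alpha)$. ■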
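The limiting behavior used in case b) can be observed numerically. The sketch below evaluates $g_p$ through (29), with $x$ obtained from (25) by bisection, and tabulates the ratio $g_p(2\beta)/g_p(\alpha)$ for decreasing $\lambda$ at a fixed negative $p$. The values of $p$, $\alpha$, $\beta$, $m$, and $k$ are hypothetical, and the helper names are assumptions.

```python
def solve_x(w, lam, p, iters=200):
    """Solve (25), w = x - lam^(2-p) x^(p-1), for x > lam by bisection (w > 0, p <= 1)."""
    lo, hi = lam, w + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - lam * (lam / mid) ** (1.0 - p) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g_p(w, lam, p):
    """Penalty component g_p evaluated through (29); requires p != 0 and p <= 1."""
    r = solve_x(abs(w), lam, p) / lam
    return lam * (r ** p / p - 0.5 * r ** (2.0 * p - 2.0) - (1.0 / p - 0.5))

p = -1.0                   # a fixed negative exponent
alpha, beta = 0.5, 1.0     # hypothetical entry bounds, as in Lemma III.4
m, k = 10, 3               # hypothetical sizes with 2k < m
lams = [1e-1, 1e-2, 1e-3, 1e-4]

ratios = [g_p(2.0 * beta, lam, p) / g_p(alpha, lam, p) for lam in lams]
conditions = [k * g_p(2.0 * beta, lam, p) < (m + 1 - k) * g_p(alpha, lam, p) for lam in lams]
for lam, ratio in zip(lams, ratios):
    print(lam, ratio)
```

The ratio stays above 1 because $g_p$ is increasing, and it decreases toward 1 as $\lambda$ shrinks, so it eventually falls below any threshold $(m + 1 - k)/k > 1$.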
IV. Stability

Next we consider the case of noisy measurements of an approximately sparse signal. Let $x$
be the original signal with $\|Ax - b\|_2 \le \epsilon$ whose $k$-sparse approximation is supported
on $T$, i.e. $x_T = \arg\min_w G(x - w)$ subject to $\|w\|_0 = k$. We wish to bound $G(x^* - x)$ where

$$x^* = \arg\min_w G(w) \quad \text{subject to } \|Aw - b\|_2 \le \epsilon. \tag{37}$$

We shall bound the recovery error by the sum of a term dependent on the noise level and a term
dependent on the sparse approximation error.

We shall first need two results: bounds on the magnitudes of nonzero entries of local minima
of (37) and an extension of those bounds to the error vector projected onto the null space of $A$.
Recall that $\|w\|_{-\infty} := \min_i |w_i|$.


Lemma IV.1. Assume $A \in \mathbb{R}^{m \times n}$ satisfies the URP and $G$ satisfies (I,II)
above. Let $b \in \mathbb{R}^m$ be given. For $S \subset [n]$ with $|S| = m$ define
$\alpha_S = \|A_S^{-1} b\|_{-\infty}$ and $\beta_S = \|A_S^{-1} b\|_\infty$. If
$\epsilon < \min_S(\alpha_S/\|A_S^{-1}\|)$, then the magnitudes of components of feasible vectors of
(37) are bounded below by $\alpha := \min_S(\alpha_S - \|A_S^{-1}\|\epsilon) > 0$ and bounded above
by $\beta := \max_S(\beta_S + \|A_S^{-1}\|\epsilon)$.

The assumption that $\alpha_S > 0$ for all $S$ has a similar character to the URP, in that it is
true with probability 1 for random data drawn from an absolutely continuous distribution.

Proof: First, note that the error-bounded problem (37) is equivalent to taking the $G$
minimizer from a set of equality-constrained $G$ minimizers (with different equality constraints):
For all feasible $w$, we must have $Aw = b + \eta$ for some $\|\eta\|_2 \le \epsilon$. Thus by
Lemma III.3 the minimizer of (37) has $m$ or fewer nonzero entries. By the URP, any $m$ columns $S$
of $A$ give exactly one solution to $A_S w = b + \eta$. So we have

$$\|w\|_{-\infty} = \|A_S^{-1}(b + \eta)\|_{-\infty} \ge \min_i\left(|A_S^{-1} b|_i - |A_S^{-1}\eta|_i\right)$$

$$\ge \|A_S^{-1} b\|_{-\infty} - \|A_S^{-1}\eta\|_\infty \ge \alpha_S - \|A_S^{-1}\eta\|_2 \ge \alpha_S - \|A_S^{-1}\|\epsilon \tag{38}$$

$$\ge \alpha,$$

and

$$\|w\|_\infty = \|A_S^{-1}(b + \eta)\|_\infty \le \|A_S^{-1} b\|_\infty + \|A_S^{-1}\eta\|_\infty$$

$$\le \beta_S + \|A_S^{-1}\eta\|_2 \le \beta_S + \|A_S^{-1}\|\epsilon \tag{39}$$

$$\le \beta.$$
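The constants of Lemma IV.1 can be computed by enumeration for a small problem, and the chain of inequalities (38) and (39) can then be checked on a perturbed right-hand side. The following is a sketch under the assumption that $\|A_S^{-1}\|$ denotes the spectral norm; the function name, random data, and the choice of noise level as half the admissible maximum are illustrative.

```python
from itertools import combinations
import numpy as np

def noisy_entry_bounds(A, b, eps):
    """Return (alpha, beta, eps_max): the bounds of Lemma IV.1 and min_S alpha_S / ||A_S^{-1}||."""
    m, n = A.shape
    alpha, beta, eps_max = np.inf, 0.0, np.inf
    for S in combinations(range(n), m):
        inv_S = np.linalg.inv(A[:, list(S)])
        z = np.abs(inv_S @ b)
        op_norm = np.linalg.norm(inv_S, 2)  # spectral norm of A_S^{-1}
        alpha = min(alpha, z.min() - op_norm * eps)
        beta = max(beta, z.max() + op_norm * eps)
        eps_max = min(eps_max, z.min() / op_norm)
    return alpha, beta, eps_max

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
_, _, eps_max = noisy_entry_bounds(A, b, 0.0)
eps = 0.5 * eps_max
alpha, beta, _ = noisy_entry_bounds(A, b, eps)

eta = rng.standard_normal(3)
eta *= eps / np.linalg.norm(eta)  # a perturbation with norm exactly eps
worst_low, worst_high = np.inf, 0.0
for S in combinations(range(5), 3):
    w_S = np.abs(np.linalg.solve(A[:, list(S)], b + eta))
    worst_low = min(worst_low, w_S.min())
    worst_high = max(worst_high, w_S.max())
print(alpha, worst_low, worst_high, beta)
```

Every entry magnitude of every perturbed solution lands in $[\alpha, \beta]$, as (38) and (39) predict.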


Lemma IV.2. Assume $G$ satisfies (I,II). Let $x^*$ be the global minimizer of (37), $x$ the
original signal with $\|Ax - b\| \le \epsilon$, and let $T$ be the support of the $k$-sparse
approximation of $x$. Let $\alpha_S$, $\alpha$, and $\beta$ be as in Lemma IV.1. Define
$\alpha' := \alpha - \|x_{T^c}\|_\infty - 2\epsilon$ and $\beta' := \beta + \epsilon$. If $A$
satisfies the URP, $AA^T = I$, $\min_S \alpha_S > \|x_{T^c}\|_\infty$ (requiring that $x$ be nearly
$k$-sparse), and $\epsilon < \min_S\{(\alpha_S - \|x_{T^c}\|_\infty)/(2 + \|A_S^{-1}\|)\}$, then the
orthogonal projection $w$ of $h = x^* - x$ onto the nullspace of $A$ satisfies

$$\alpha' \le \|w_{T^c}\|_{-\infty} \quad \text{and} \quad \|w_{T^c}\|_\infty \le 2\beta'. \tag{40}$$
(40) 


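Proof: First, consider the bound $\epsilon < \min_S\{(\alpha_S - \|x_{T^c}\|_\infty)/(2 + \|A_S^{-1}\|)\}$. Note that this is
stronger than the bound on $\epsilon$ from Lemma IV.1, and it implies $2\epsilon + \|x_{T^c}\|_\infty < \alpha$. We see this
from the following inequalities:
\[
\begin{aligned}
\alpha &= \min_S \{\alpha_S - \epsilon \|A_S^{-1}\|\} \\
&> \min_S \left\{ \alpha_S - \frac{(\alpha_S - \|x_{T^c}\|_\infty)\,\|A_S^{-1}\|}{2 + \|A_S^{-1}\|} \right\} \\
&= \min_S \left\{ \frac{2\alpha_S + \|A_S^{-1}\|\,\|x_{T^c}\|_\infty}{2 + \|A_S^{-1}\|} \right\} \\
&= \min_S \left\{ \frac{2\alpha_S - 2\|x_{T^c}\|_\infty}{2 + \|A_S^{-1}\|} + \frac{(2 + \|A_S^{-1}\|)\,\|x_{T^c}\|_\infty}{2 + \|A_S^{-1}\|} \right\} \\
&> 2\epsilon + \|x_{T^c}\|_\infty.
\end{aligned} \tag{41}
\]
We shall use this below to guarantee $\alpha' > 0$.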


Note that the hypotheses of Lemma IV.1 are satisfied, giving $\|x^*\|_{-\infty} \ge \alpha$ and $\|x^*\|_\infty \le \beta$,
$\|x\|_\infty \le \beta$. Since $AA^T = I$, the orthogonal projection of $h$ onto the nullspace of $A$ is $(I - A^T A)h$.
The desired lower bound comes from the following sequence of inequalities, using the given
lower bound on nonzero elements of $x^*$, the feasibility of $x^*$ and $x$, the fact $\|A^T A\| = 1$, and
the assumed bound on $\epsilon$:
\[
\begin{aligned}
\|[(I - A^T A)h]_{T^c}\|_{-\infty} &\ge \|h_{T^c}\|_{-\infty} - \|A^T A h\|_\infty \\
&\ge \|x^*_{T^c} - x_{T^c}\|_{-\infty} - \|A^T A h\|_2 \\
&\ge \|x^*_{T^c}\|_{-\infty} - \|x_{T^c}\|_\infty - \|Ah\|_2 \\
&\ge \alpha - \|x_{T^c}\|_\infty - 2\epsilon = \alpha' > 0.
\end{aligned}
\]
The upper bound comes from a completely analogous argument:
\[
\begin{aligned}
\|(I - A^T A)h\|_\infty &\le \|h\|_\infty + \|A^T A h\|_\infty \\
&\le \|x^* - x\|_\infty + \|A^T A h\|_2 \\
&\le 2\beta + 2\epsilon = 2\beta'.
\end{aligned}
\]


Definition IV.3. The $G$ Noisy Nullspace Property (or $G$ NNSP) of order $k$ for the matrix $A$ is
satisfied when for all $h \in \mathbb{R}^n$ and $S \subset [n]$ with $|S| \le k$, there are constants $0 < \tau < 1$ and
$D > 0$ such that
\[
G(h_S) \le \tau G(h_{S^c}) + D\|Ah\|_2. \tag{42}
\]


Proposition IV.4. Assume $G$ satisfies the triangle inequality. For given $A$, $b$, let $x^*$ be the
global minimizer of (37) and let $x$ be the original signal with $\|Ax - b\|_2 \le \epsilon$ whose $k$-sparse
approximation is supported on $T$. Then the $G$ NNSP of order $k$ for $A$ implies the following
stability bound:
\[
G(x^* - x) \le C_1 \epsilon + C_2 G(x_{T^c}) \tag{43}
\]
with $C_1 = 4D/(1 - \tau)$ and $C_2 = 2(1 + \tau)/(1 - \tau)$, where $\tau$ and $D$ satisfy (42).


Proof: Define the error vector $h = x^* - x$. Since $x^*$ and $x$ are both feasible and $\|A\| = 1$,
$\|Ah\|_2 \le 2\epsilon$. Then by the triangle inequality of $G$,
\[
G(x_T) - G(-h_T) \le G(x_T + h_T). \tag{44}
\]
Since $G$ decouples across components,
\[
G(x_T + h_T) + G(h_{T^c}) = G(x_T + h_T + h_{T^c}) = G(x^* - x_{T^c}). \tag{45}
\]
Then
\[
\begin{aligned}
G(h_{T^c}) &\le G(x^* - x_{T^c}) + G(h_T) - G(x_T) \\
&\le G(x^*) + G(x_{T^c}) + G(h_T) - G(x_T) \\
&\le G(x) + G(x_{T^c}) + G(h_T) - G(x_T) \\
&= 2G(x_{T^c}) + G(h_T).
\end{aligned} \tag{46}
\]
Now apply $G$ NNSP to $h$ on $T$:
\[
G(h_T) \le \tau G(h_{T^c}) + D\|Ah\|_2 \le 2\tau G(x_{T^c}) + \tau G(h_T) + 2D\epsilon, \tag{47}
\]
so that
\[
G(h_T) \le \frac{2}{1 - \tau}\big(D\epsilon + \tau G(x_{T^c})\big). \tag{48}
\]
Using (46), we obtain
\[
G(h_{T^c}) \le 2G(x_{T^c}) + G(h_T) \le \frac{2D}{1 - \tau}\,\epsilon + \frac{2}{1 - \tau}\,G(x_{T^c}). \tag{49}
\]
Now we add (48) and (49) to get the desired inequality:
\[
\begin{aligned}
G(h) &= G(h_T) + G(h_{T^c}) \\
&\le \frac{2}{1 - \tau}\big(D\epsilon + \tau G(x_{T^c})\big) + \frac{2D}{1 - \tau}\,\epsilon + \frac{2}{1 - \tau}\,G(x_{T^c}) \\
&= \frac{4D}{1 - \tau}\,\epsilon + \frac{2(1 + \tau)}{1 - \tau}\,G(x_{T^c}).
\end{aligned} \tag{50}
\]
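
The bookkeeping in this proof can be sanity-checked numerically. The sketch below, with hypothetical function names and arbitrary test values for $\tau$, $D$, $\epsilon$, and $G(x_{T^c})$, evaluates the right-hand sides of (48) and (49) and the constants $C_1$, $C_2$ of (43), so that one can confirm that the two partial bounds add up to the stated stability bound.

```python
import math

def stability_constants(tau, D):
    """C_1 and C_2 of (43) for NNSP constants 0 < tau < 1 and D > 0."""
    if not (0.0 < tau < 1.0) or D <= 0.0:
        raise ValueError("need 0 < tau < 1 and D > 0")
    return 4.0 * D / (1.0 - tau), 2.0 * (1.0 + tau) / (1.0 - tau)

def stability_bound(tau, D, eps, G_tail):
    """Right-hand side of (43): C_1 * eps + C_2 * G(x_{T^c})."""
    c1, c2 = stability_constants(tau, D)
    return c1 * eps + c2 * G_tail

def split_bounds(tau, D, eps, G_tail):
    """Right-hand sides of (48) (bound on G(h_T)) and (49) (bound on G(h_{T^c}))."""
    on_T = 2.0 / (1.0 - tau) * (D * eps + tau * G_tail)
    off_T = 2.0 * D / (1.0 - tau) * eps + 2.0 / (1.0 - tau) * G_tail
    return on_T, off_T
```

Note how both constants blow up as $\tau \to 1$, which is why the later results work to keep $\tau$ bounded away from 1.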


Theorem IV.5 ($G$ stability). Assume $A \in M^{m \times n}$ satisfies the URP, $AA^T = I$, $G$ satisfies (I,II)
above, and $G(v) \le C\sqrt{n}\,\|v\|_2$ for some constant $C > 0$. For given $b$, let $x$ be the original signal
with $\|Ax - b\|_2 \le \epsilon$, let $T$ be the support of its $k$-sparse approximation, and suppose $\min_S\{\alpha_S\} >
\|x_{T^c}\|_\infty$. Let $x^*$ be the global minimizer of (37), where $\epsilon < \min_S\{(\alpha_S - \|x_{T^c}\|_\infty)/(2 + \|A_S^{-1}\|)\}$
(with $\alpha_S$ defined as in Lemma IV.1). Define $\alpha'$, $\beta'$ as in Lemma IV.2. Assume that $2k < n$ and
$k\,g(2\beta') < (n - k)\,g(\alpha')$. Then
\[
G(x^* - x) \le 2\left(1 - \frac{k\,g(2\beta')}{(n - k)\,g(\alpha')}\right)^{-1}
\left[ 2C\sqrt{n}\,\epsilon + \left(1 + \frac{k\,g(2\beta')}{(n - k)\,g(\alpha')}\right) G(x_{T^c}) \right]. \tag{51}
\]
Proof: We shall show that the given hypotheses allow for the same application of the $G$
NNSP as in Proposition IV.4 and, in a similar way, arrive at stability. Define $h = x^* - x$. Since
$G$ satisfies the triangle inequality, we have $G(h_{T^c}) \le G(h_T) + 2G(x_{T^c})$, as in the proof of
Proposition IV.4.

Next we write $h$ as the sum of its orthogonal projections onto $\ker(A)$ and $\ker(A)^\perp$, which
we denote by $w$ and $v$ respectively. First, suppose that there exists some $0 < \tau < 1$ such that
$G(w_T) \le \tau G(w_{T^c})$ (which we will prove below). Then we have:
\[
\begin{aligned}
G(h_T) &\le G(w_T) + G(v_T) \\
&\le \tau G(w_{T^c}) + G(v_T) = \tau G(w_{T^c} + v_{T^c} - v_{T^c}) + G(v_T) \\
&\le \tau G(h_{T^c}) + G(v_{T^c}) + G(v_T) \\
&= \tau G(h_{T^c}) + G(v) \\
&\le \tau G(h_{T^c}) + C\sqrt{n}\,\|v\|_2.
\end{aligned} \tag{52}
\]
Since $AA^T = I$ and $v \in \ker(A)^\perp$, it follows that $v = A^T A v$. Hence $\|v\|_2^2 = \|Av\|_2^2$. Then from
(52) we obtain
\[
G(h_T) \le \tau G(h_{T^c}) + C\sqrt{n}\,\|Av\|_2. \tag{53}
\]
Since $w \in \ker(A)$, we have $Av = Ah$, and so we have the application of the $G$ NNSP to $h$ on $T$
with constants $\tau$ and $D = C\sqrt{n}$. From here the stability inequality (51) follows as in
Proposition IV.4.


Now we go back to prove $G(w_T) \le \tau G(w_{T^c})$. We shall use the lower bound $\|w_{T^c}\|_{-\infty} \ge \alpha'$
and the upper bound $\|w_T\|_\infty \le 2\beta'$ from Lemma IV.2. We overestimate $G(w_T)$ and underestimate
$G(w_{T^c})$ as follows:
\[
G(w_T) \le k\,g(2\beta'), \qquad G(w_{T^c}) \ge (n - k)\,g(\alpha'). \tag{54}
\]
So to get $G(w_T) \le \tau G(w_{T^c})$, it suffices to have $k\,g(2\beta') \le \tau (n - k)\,g(\alpha')$, and thus $k\,g(2\beta') <
(n - k)\,g(\alpha')$ guarantees some $0 < \tau < 1$. The condition $k < n - k$ gives $(n - k)/k > 1$ and
thus makes the inequality possible for $\alpha' < 2\beta'$.
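
The sufficient condition just described is easy to test for a candidate penalty. The following sketch takes the scalar penalty `g` as a user-supplied callable and returns the ratio $k\,g(2\beta')/((n - k)\,g(\alpha'))$ when it lies in $(0, 1)$. The function name `nnsp_tau` is hypothetical, and the square-root penalty used in the usage note is only a stand-in chosen for easy arithmetic; it is not one of the penalties analyzed in the text.

```python
import math

def nnsp_tau(g, k, n, alpha_prime, beta_prime):
    """Return tau = k*g(2*beta') / ((n-k)*g(alpha')) if it lies in (0, 1), else None.

    g is the scalar penalty (assumed positive and nondecreasing on (0, inf)),
    k the sparsity level, n the ambient dimension, and alpha', beta' the
    bounds of Lemma IV.2.
    """
    if k <= 0 or 2 * k >= n or alpha_prime <= 0.0:
        return None  # the hypotheses 2k < n and alpha' > 0 fail
    tau = k * g(2.0 * beta_prime) / ((n - k) * g(alpha_prime))
    return tau if 0.0 < tau < 1.0 else None
```

With `g = math.sqrt`, `k = 2`, `n = 10`, $\alpha' = 1$ and $\beta' = 2$, the condition holds; shrinking $\alpha'$ toward zero makes $g(\alpha')$ small and eventually violates it, which mirrors the role of the lower bound in Lemma IV.2.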

Plugging in $\tau = \frac{k\,g(2\beta')}{(n - k)\,g(\alpha')}$ to the stability inequality we get from the previous argument gives
\[
G(h) \le 2\left(1 - \frac{k\,g(2\beta')}{(n - k)\,g(\alpha')}\right)^{-1}
\left[ 2C\sqrt{n}\,\epsilon + \left(1 + \frac{k\,g(2\beta')}{(n - k)\,g(\alpha')}\right) G(x_{T^c}) \right]. \tag{55}
\]

Corollary IV.6 ($G_{\mathrm{firm}}$ stability). Assume $A \in M^{m \times n}$ satisfies the URP, $AA^T = I$, and $G =
G_{\mathrm{firm}}$, the penalty corresponding to firm thresholding. For given $b$, let $x$ be the original signal
with $\|Ax - b\|_2 \le \epsilon$ whose $k$-sparse approximation is supported on $T$, with $\min_S\{\alpha_S\} >
\|x_{T^c}\|_\infty$, and $x^*$ be the global minimizer of (37), where $\epsilon < \min_S\{(\alpha_S - \|x_{T^c}\|_\infty)/(2 + \|A_S^{-1}\|)\}$
(with $\alpha_S$ defined as in Lemma IV.1). Define $\alpha'$, $\beta'$ as in Lemma IV.2. If $2k < n$ and $\mu <
\min\{\alpha'\,[\ldots],\ 2\beta'\}$, then $x^*$ is stable, satisfying the following inequality:
\[
G_{\mathrm{firm}}(x^* - x) \le 2\left(1 - \frac{k\,g_{\mathrm{firm}}(2\beta')}{(n - k)\,g_{\mathrm{firm}}(\alpha')}\right)^{-1}
\left[ 2C\sqrt{n}\,\epsilon + \left(1 + \frac{k\,g_{\mathrm{firm}}(2\beta')}{(n - k)\,g_{\mathrm{firm}}(\alpha')}\right) G_{\mathrm{firm}}(x_{T^c}) \right]. \tag{56}
\]


The proof of Corollary IV.6 is an application of Theorem IV.5 combined with the corresponding
computations from the proof of Corollary III.10.


Corollary IV.7 ($G_p$ stability). Assume $A \in M^{m \times n}$ satisfies the URP, $AA^T = I$, and $G = G_p$, the
penalty corresponding to $p$-shrinkage. For given $b$, let $x$ be the original signal with $\|Ax - b\|_2 \le \epsilon$
whose $k$-sparse approximation is supported on $T$, with $\min_S\{\alpha_S\} > \|x_{T^c}\|_\infty$, and $x^*$ be the
global minimizer of (37), where $\epsilon < \min_S\{(\alpha_S - \|x_{T^c}\|_\infty)/(2 + \|A_S^{-1}\|)\}$ (with $\alpha_S$ defined as in
Lemma IV.1). If $2k < n$ then there exist $0 < p < 1$, $0 < \lambda$ sufficiently small so that $x^*$ is stable,
satisfying the following inequality:
\[
G_p(x^* - x) \le 2\left(1 - \frac{k\,g_p(2\beta')}{(n - k)\,g_p(\alpha')}\right)^{-1}
\left[ 2C\sqrt{n}\,\epsilon + \left(1 + \frac{k\,g_p(2\beta')}{(n - k)\,g_p(\alpha')}\right) G_p(x_{T^c}) \right]. \tag{57}
\]
Also, for any $p \le 0$ there exists $\lambda > 0$ sufficiently small such that $x^*$ is stable, and the above
inequality holds.


The proof of Corollary IV.7 is an application of Theorem IV.5 combined with the corresponding
computations from the proof of Corollary III.11.


V. Convergence of iterative p-shrinkage

Now we consider an algorithm that employs generalized shrinkage. Consider the following 
optimization problem: 

\[
\min_x F_p(x) := \lambda G_p(x) + \tfrac{1}{2}\|Ax - b\|_2^2, \tag{58}
\]
where $\|A\| < 1$. Applying forward-backward splitting to this problem gives iterative p-shrinkage
(IPS):
\[
x^{n+1} = S_p\big(x^n - A^T(Ax^n - b)\big). \tag{59}
\]


This generalizes the iterative soft thresholding algorithm (ISTA) [31], which is the case $p = 1$.
ISTA was shown in [31] to be globally convergent to a global minimizer (necessarily, since
$F_1$ is convex). In this section, we prove global convergence of IPS for general $p < 1$, though
only to a stationary point of $F_p$. Portions of the proof appeared in [28], though statements there
concerning convergence to a local minimizer are incorrect.
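
A minimal sketch of the iteration (59) in plain Python follows. It assumes the componentwise form $S_p(y)_i = \mathrm{sign}(y_i)\max\{|y_i| - \lambda^{2-p}|y_i|^{p-1}, 0\}$ for the $p$-shrinkage, which reduces to soft thresholding with threshold $\lambda$ when $p = 1$; the function names and the list-based matrix representation are choices made for this example only, and $\|A\| < 1$ is assumed rather than checked.

```python
import math

def p_shrink(y, lam, p):
    """Scalar p-shrinkage (assumed form): sign(y) * max(|y| - lam^(2-p) * |y|^(p-1), 0)."""
    a = abs(y)
    if a == 0.0:
        return 0.0  # avoids 0 ** (negative power) when p < 1
    mag = max(a - lam ** (2.0 - p) * a ** (p - 1.0), 0.0)
    return math.copysign(mag, y)

def matvec(A, x):
    """Compute A x for A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r for A given as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def ips(A, b, lam, p, iters=200, x0=None):
    """Run `iters` steps of x <- S_p(x - A^T (A x - b)); returns the final iterate and all iterates."""
    x = [0.0] * len(A[0]) if x0 is None else list(x0)
    history = [x]
    for _ in range(iters):
        residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)
        x = [p_shrink(xj - gj, lam, p) for xj, gj in zip(x, grad)]
        history.append(x)
    return x, history
```

Calling `ips(A, b, lam, 1.0)` gives ordinary ISTA with unit step, while values of `p` below 1 shrink large entries less, which is the behavior the nonconvex penalty is meant to provide.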


Recall from Lemma III.2 that $g_p$ is $C^\infty$ on $(0, \infty)$. A closer examination of the proof shows
that $g_p$ on $[0, \infty)$ is the restriction of a function that is $C^\infty$ on $\mathbb{R}$, so $g_p$ is one-sided differentiable
to all orders at $w = 0$.


The following follows exactly as in the known case of $p = 1$ [31]:


Lemma V.1 ([28]). Let $\lambda > 0$ and $p \in \mathbb{R}$, and define $\{x^n\}$ by (59), with $x^0$ arbitrary.

1) $F_p(x^{n+1}) \le F_p(x^n)$ for all $n$, and $F_p(x^{n+1}) < F_p(x^n)$ unless $x^n$ is a fixed point of the
algorithm.

2) $\|x^{n+1} - x^n\|_2 \to 0$.


Lemma V.2. Let $\lambda > 0$ and $p \in \mathbb{R}$. The fixed points of (59) are precisely the stationary points
of $F_p$.


Proof: The iteration (59) can be seen as minimizing the surrogate functional
\[
\lambda G_p(x) + \tfrac{1}{2}\|Ax - b\|_2^2 + \tfrac{1}{2}\|x - w\|_2^2 - \tfrac{1}{2}\|Ax - Aw\|_2^2 \tag{60}
\]
with fixed $w = x^n$, by expanding the quadratic terms and rearranging to express the minimizer
in terms of the proximal mapping of $G_p$. Therefore the first-order optimality condition of this
functional is satisfied at $x = x^{n+1}$. Also, the first-order optimality condition of this functional
at $x = x^n$ is the same as the first-order optimality condition of $F_p$ at $x = x^n$. Hence $x^{n+1} = x^n$
if and only if the first-order optimality condition of $F_p$ at $x = x^n$ is satisfied. ■
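
For readers who want the expansion spelled out, the rearrangement used in this proof can be sketched as follows, where $c(w)$ and $\tilde c(w)$ are symbols introduced here for terms that do not depend on $x$:

```latex
\begin{aligned}
&\tfrac12\|Ax-b\|_2^2 + \tfrac12\|x-w\|_2^2 - \tfrac12\|Ax-Aw\|_2^2 \\
&\qquad= \tfrac12\|x\|_2^2 - \big\langle x,\; w - A^T(Aw-b) \big\rangle + c(w) \\
&\qquad= \tfrac12\big\|x - \big(w - A^T(Aw-b)\big)\big\|_2^2 + \tilde c(w).
\end{aligned}
```

The $\tfrac12\|Ax\|_2^2$ terms cancel, so minimizing the surrogate over $x$ amounts to evaluating the proximal mapping of $\lambda G_p$ at $w - A^T(Aw - b)$. With $w = x^n$, and with $S_p$ being that proximal mapping, this is the update (59).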

The lemma shows why it is not possible to show that IPS converges to a local minimizer: if
the algorithm happens to be initialized with a stationary point that is not a local minimizer (i.e.,
a saddle point or local maximizer), then the initializer is a fixed point of the algorithm, so the
algorithm cannot converge to a local minimizer in such a case.

Lemma V.3. Fix $\lambda > 0$, $p \in (-\infty, 1)$. We have $g_p''' > 0$ on $(0, \infty)$, $g_p''' < 0$ on $(-\infty, 0)$,
$g_p'''(0+) > 0$, and $g_p'''(0-) < 0$.

Proof: Since $g_p$ is even, it suffices to consider $w > 0$. Above we had that $x = x(w) =
(f_p^*)'(w)$ satisfies $x - \lambda^{2-p} x^{p-1} = w$. Differentiating with respect to $w$, we have that
\[
x' - \lambda^{2-p}(p - 1)x^{p-2}x' = 1, \tag{61}
\]
so
\[
x' = \big(1 - \lambda^{2-p}(p - 1)x^{p-2}\big)^{-1}. \tag{62}
\]
Since $p < 1$, $(f_p^*)''(w) = x'(w) > 0$ for all $w > 0$.

Differentiating (61), we get
\[
x'' - \lambda^{2-p}(p - 1)\big[(p - 2)x^{p-3}(x')^2 + x^{p-2}x''\big] = 0, \tag{63}
\]
or
\[
x''\big(1 - \lambda^{2-p}(p - 1)x^{p-2}\big) = \lambda^{2-p}(p - 1)(p - 2)x^{p-3}(x')^2, \tag{64}
\]
implying that $x''$ has the same sign as $x$. Since $x(w)$ has the same sign as $w$, we have that
$(f_p^*)'''(w)$ has the same sign as $w$ for $w \ne 0$.

Differentiating the relation defining $g_p$, we obtain $w + \lambda g_p'(w) = (f_p^*)'(w)$, $1 + \lambda g_p''(w) =
(f_p^*)''(w)$, and $\lambda g_p'''(w) = (f_p^*)'''(w)$. Thus $g_p'''(w)$ has the same sign as $w$ for $w \ne 0$ as well.

Also, $\lambda g_p'''(0+) = (f_p^*)'''(0+) = \lim_{w \to 0+} x''(w)$. Since $\lim_{w \to 0+} x(w) = \lambda$, we obtain from (62)
and (64) that $(f_p^*)'''(0+) = \frac{1 - p}{\lambda(2 - p)^2} > 0$. Thus $g_p'''(0+) > 0$. ■
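
The derivative formulas in this proof can be checked numerically. The sketch below is an illustrative check, not part of the argument: it solves $x - \lambda^{2-p}x^{p-1} = w$ by bisection on $[\lambda, \lambda + w]$ (a bracket that is valid for $w \ge 0$ and $p < 1$), evaluates $x'$ from (62) and $x''$ from (64), and lets one compare $x''$ with a centered second difference. All function names are hypothetical.

```python
def x_of_w(w, lam, p, iters=200):
    """Solve x - lam^(2-p) * x^(p-1) = w for x >= lam by bisection (w >= 0, p < 1)."""
    lo, hi = lam, lam + w
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - lam ** (2.0 - p) * mid ** (p - 1.0) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def x_prime(x, lam, p):
    """First derivative x'(w) expressed through x, as in (62)."""
    return 1.0 / (1.0 - lam ** (2.0 - p) * (p - 1.0) * x ** (p - 2.0))

def x_second(x, lam, p):
    """Second derivative x''(w) from (64), using x' = 1 / (1 - lam^(2-p) (p-1) x^(p-2))."""
    xp = x_prime(x, lam, p)
    return lam ** (2.0 - p) * (p - 1.0) * (p - 2.0) * x ** (p - 3.0) * xp ** 3

def second_difference(w, h, lam, p):
    """Centered finite-difference approximation of x''(w)."""
    return (x_of_w(w + h, lam, p) - 2.0 * x_of_w(w, lam, p) + x_of_w(w - h, lam, p)) / (h * h)
```

Evaluating `x_second` at $x = \lambda$ gives the one-sided limit of $x''$ at $w = 0+$, which is positive for every $p < 1$ and $\lambda > 0$.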

Lemma V.4. Let $p \ge 0$. Then $\{x^n\}$ is bounded.

Proof: Since $\{F_p(x^n)\}$ decreases monotonically, it suffices to show that $F_p$ is coercive,
which we establish by showing coercivity of $g_p$. By (25), if $w \to \infty$, then $x \to \infty$. For $p > 0$,
that $g_p(w) \to \infty$ follows from (29). The $p = 0$ case is similar, but $f_0$ has a different form:
\[
\begin{aligned}
g_0(w) &= \big(xw - f_0(x) - w^2/2\big)/\lambda \\
&= \big(xw - x^2/2 + \lambda^2 \log x - \lambda^2(\log\lambda - 1/2) - w^2/2\big)/\lambda \\
&= \lambda \log x - (x - w)^2/(2\lambda) - \lambda(\log\lambda - 1/2) \\
&= \lambda \log x - \tfrac{\lambda}{2}(x/\lambda)^{-2} - \lambda(\log\lambda - 1/2).
\end{aligned} \tag{65}
\]
From this the coercivity of $g_0$ follows.
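
For $p = 0$ the relation $x - \lambda^2/x = w$ is a quadratic in $x$, so $g_0$ can be evaluated in closed form. The sketch below (hypothetical function names) implements the last line of (65) and, separately, the first line with $f_0$ read off from the second line, so that the two forms can be compared and the slow, logarithmic growth of $g_0$ observed.

```python
import math

def x_of_w0(w, lam):
    """Positive root of x - lam^2 / x = w, i.e. of x^2 - w x - lam^2 = 0 (for w >= 0)."""
    return 0.5 * (w + math.sqrt(w * w + 4.0 * lam * lam))

def g0(w, lam):
    """g_0(w) via the last line of (65); g_0 is even, so |w| is used."""
    x = x_of_w0(abs(w), lam)
    return lam * math.log(x) - 0.5 * lam * (x / lam) ** (-2) - lam * (math.log(lam) - 0.5)

def g0_first_form(w, lam):
    """g_0(w) via the first line of (65), with f_0 taken from the second line."""
    w = abs(w)
    x = x_of_w0(w, lam)
    f0 = 0.5 * x * x - lam * lam * math.log(x) + lam * lam * (math.log(lam) - 0.5)
    return (x * w - f0 - 0.5 * w * w) / lam
```

The first form suffers from cancellation for very large $w$, so the last line of (65) is the better one to evaluate numerically.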


Lemma V.5. Let $p < 0$, and assume $\lambda^2 > p\|b\|_2^2/(p - 2)$. Let $x^0 = 0$. Then $\{x^n\}$ is bounded.

Proof: From Lemma V.1 we know that $F_p(x^n)$ decreases (strictly except at a fixed point,
in which case we are done). Then for $n \ge 1$,
\[
F_p(x^n) \le F_p(x^0) = \|b\|_2^2/2, \tag{66}
\]
so
\[
G_p(x^n) \le F_p(x^n)/\lambda \le \|b\|_2^2/(2\lambda). \tag{67}
\]
By (34), $g_p(w) < (1/2 - 1/p)\lambda$. Combining this bound with (67), we obtain for each $j$,
\[
g_p(x_j^n) \le G_p(x^n) \le \|b\|_2^2/(2\lambda) < (1/2 - 1/p)\lambda. \tag{68}
\]
Letting $t$ be the unique positive number satisfying $g_p(t) = \|b\|_2^2/(2\lambda)$, we obtain $|x_j^n| \le t$
independently of $n$. ■
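
The last inequality in (68) is exactly where the assumption on $\lambda$ enters. As a short worked check, for $p < 0$ both $p$ and $p - 2$ are negative, so $(p - 2)/p > 0$ and the inequalities below can be multiplied through without changing direction:

```latex
\frac{\|b\|_2^2}{2\lambda} < \Big(\frac12 - \frac1p\Big)\lambda
\;\Longleftrightarrow\;
\|b\|_2^2 < \frac{p-2}{p}\,\lambda^2
\;\Longleftrightarrow\;
\lambda^2 > \frac{p\,\|b\|_2^2}{p-2}.
```

In other words, the hypothesis of Lemma V.5 says that the initial objective value $\|b\|_2^2/2$ sits strictly below the level at which the bounded penalty $g_p$ saturates, so no component of an iterate can escape to infinity.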

Now we can establish convergence of our algorithm. 


Theorem V.6. Let $\lambda > 0$, $p \in (-\infty, 1)$. Let the sequence $\{x^n\}$ be defined by (59), with $x^0$
arbitrary for $p \ge 0$, and $x^0 = 0$ for $p < 0$, in which case we further assume $\lambda^2 > p\|b\|_2^2/(p - 2)$.
Then $\{x^n\}$ converges to a stationary point of $F_p$.

Proof: We have that $F_p(x^{n+1}) < F_p(x^n)$ unless $x^n$ is a fixed point, $F_p$ is continuous, and the
sequence $\{x^n\}$ is bounded. Then by [47, Thm. 3.1], we have that either $\{x^n\}$ converges or its
limit points form a continuum. (A continuum is a compact, connected set; here we also exclude
the degenerate case of a singleton.) Since we already know that any limit point of $\{x^n\}$ will
be a stationary point of $F_p$, we complete the proof by showing that the stationary points of $F_p$
cannot form a continuum.

Let $E$ be the set of stationary points of $F_p$, and suppose $E$ is a continuum. Fix $x \in E$. For
any $\epsilon > 0$, it cannot be that $\mathcal{N}(x; \epsilon) \cap E = \{x\}$, otherwise $\{x\}$ would be both open and closed
in $E$, contrary to $E$ being connected. Thus there is a sequence of stationary points $x + v^n$ with
$v^n \ne 0$, $v^n \to 0$.

Since $\{v^n/\|v^n\|\}$ is a sequence of unit vectors, it cannot converge to zero. Then we can fix $j$
such that $\{v_j^n/\|v^n\|\}$ does not tend to zero, though of course $v_j^n \to 0$. First suppose that $x_j \ne 0$.
By considering a tail of $v^n$, we can assume that $x_j + v_j^n \ne 0$ for all $n$. Then $g_p$ is differentiable
at $x_j$ and $x_j + v_j^n$, and since $x$ and $x + v^n$ are fixed points,
\[
\lambda g_p'(x_j + v_j^n) + \big[A^T(A(x + v^n) - b)\big]_j = 0 \tag{69}
\]
and
\[
\lambda g_p'(x_j) + \big[A^T(Ax - b)\big]_j = 0. \tag{70}
\]
Define $\varphi(x) = \lambda g_p'(x_j) + [A^T(Ax - b)]_j$. All derivatives of $\varphi$ exist at every $x$ with $x_j \ne 0$. Letting
$(a_i)$ denote the columns of $A$, if $i \ne j$, we have $\partial\varphi/\partial x_i(x) = \langle a_i, a_j \rangle$, while $\partial\varphi/\partial x_j(x) =
\lambda g_p''(x_j) + \|a_j\|_2^2$. Also, $\varphi(x) = 0$ and each $\varphi(x + v^n) = 0$. By differentiability of $\varphi$, we have
\[
\frac{\varphi(x + v^n) - \varphi(x) - \nabla\varphi(x) \cdot v^n}{\|v^n\|} \to 0. \tag{71}
\]
Since the first two terms of (71) are zero, $\nabla\varphi(x) \cdot v^n = o(\|v^n\|)$ as well. By continuity of $\nabla\varphi$
at $x$, it is straightforward to show that $\nabla\varphi(x + v^n) \cdot v^n = o(\|v^n\|)$ also.

Now we consider second derivatives. $\partial^2\varphi/\partial x_i \partial x_k(x) = 0$, unless $i = k = j$, while $\partial^2\varphi/\partial x_j^2(x) =
\lambda g_p'''(x_j)$. Now by the differentiability of $\nabla\varphi$,
\[
\|\nabla\varphi(x + v^n) - \nabla\varphi(x) - \nabla^2\varphi(x)\,v^n\| = o(\|v^n\|), \tag{72}
\]
so
\[
\nabla\varphi(x + v^n) \cdot v^n - \nabla\varphi(x) \cdot v^n - v^n \cdot \nabla^2\varphi(x)\,v^n = o(\|v^n\|^2). \tag{73}
\]
But from the above we have that the first two terms are $o(\|v^n\|^2)$, so $v^n \cdot \nabla^2\varphi(x)\,v^n = o(\|v^n\|^2)$
as well. But this is $\lambda g_p'''(x_j)(v_j^n)^2$; since $(v_j^n)^2/\|v^n\|^2$ does not tend to zero by choice of $j$, it
must be that $g_p'''(x_j) = 0$, a contradiction.

Thus we must have $x_j = 0$. By choice of $j$, infinitely many $v_j^n \ne 0$, so by passing to a
subsequence we may assume that either all $v_j^n > 0$ or all $v_j^n < 0$. By the one-sided differentiability
of $g_p$, we can then repeat the above argument using a smooth extension of $g_p$ to $\mathbb{R}$. Since neither
$g_p'''(0+)$ nor $g_p'''(0-)$ is zero, we will obtain the same contradiction. Therefore $E$ cannot be a
continuum, and the sequence $\{x^n\}$ defined by (59) is convergent to a stationary point of $F_p$. ■


VI. Conclusion 

We have shown that for given signals with reasonable sparsity assumptions and a broad
class of measurement matrices, the families of penalties corresponding to p-shrinkage and firm
thresholding, like the $\ell^p$ quasinorms, provide a candidate penalty that is able to exactly recover
the given data with the given measurement matrix. Further, we have shown that these penalties
behave well with respect to the addition of noise in the measurements, or only approximately
sparse signals (as is often the case in practical settings). Finally, we have shown that iterative
p-shrinkage converges to stationary points of the unconstrained energy. These results, together
with empirical results (see [23], and Fig. 3), further support the idea that generalized shrinkage
penalties can be an advantageous alternative to standard $\ell^1$ compressed sensing, or $\ell^p$ compressed
sensing.

Further work could benefit from exploring in what generality these types of results hold. The
theory of generalized shrinkage allows for an endless possibility of other shrinkages and penalties
to study. Additionally, the methods of proof may apply to compressed sensing relaxations that
arise in other ways. Generally speaking, determining conditions under which convex optimization
results can be extended to handle nonconvex functionals may continue to be a fruitful area of
research. Lastly, we make no claims that the approximations made in these proofs give the
tightest results possible, so further refinement of these results may be possible and interesting.